17 research outputs found
Content-based document image retrieval using sketch queries
This doctoral dissertation was initially published in summary form only, due to unavoidable circumstances that prevented full-text publication; as those circumstances have since been resolved, the full text was published on April 20, 2020. University of Tsukuba
A novel shape descriptor based on salient keypoints detection for binary image matching and retrieval
We introduce a shape descriptor that extracts keypoints from binary images and
automatically detects the salient ones among them. The proposed descriptor operates as
follows: First, the contours of the image are detected and an image transformation is used to
generate background information. Next, pixels of the transformed image that have specific
characteristics in their local areas are used to extract keypoints. Afterwards, the most salient
keypoints are automatically detected by filtering out redundant and sensitive ones. Finally,
a feature vector is calculated for each keypoint by using the distribution of contour points
in its local area. The proposed descriptor is evaluated using public datasets of silhouette
images, handwritten math expressions, hand-drawn diagram sketches, and noisy scanned
logos. Experimental results show that the proposed descriptor compares strongly against state-of-the-art methods, and that it is reliable when applied to challenging images such as fluctuating handwriting and noisy scanned images. Furthermore, we integrate our descriptor
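The pipeline above (contour detection, keypoint extraction, redundancy filtering, and local contour-distribution features) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the 4-neighbour contour test, the `min_dist` redundancy filter, and the angular-histogram feature are assumed stand-ins for the actual salient-keypoint detection and descriptor.

```python
import numpy as np

def contour_points(img):
    """Coordinates of foreground pixels that touch the background
    (a simple 4-neighbour contour test on a binary image)."""
    pts = []
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            if img[y, x]:
                nb = [img[y2, x2] for y2, x2 in
                      ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                      if 0 <= y2 < h and 0 <= x2 < w]
                if not all(nb) or len(nb) < 4:
                    pts.append((y, x))
    return np.array(pts)

def salient_keypoints(pts, min_dist=3.0):
    """Greedily keep contour points at least `min_dist` apart,
    filtering out redundant (near-duplicate) keypoints."""
    kept = []
    for p in pts:
        if all(np.hypot(*(p - q)) >= min_dist for q in kept):
            kept.append(p)
    return np.array(kept)

def descriptor(kp, pts, radius=8, bins=8):
    """Feature vector: normalised angular histogram of the contour
    points falling in the keypoint's local area."""
    d = pts - kp
    dist = np.hypot(d[:, 0], d[:, 1])
    local = d[(dist > 0) & (dist <= radius)]
    ang = np.arctan2(local[:, 0], local[:, 1])
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)

# toy binary image: a filled square
img = np.zeros((16, 16), dtype=bool)
img[4:12, 4:12] = True
pts = contour_points(img)
kps = salient_keypoints(pts)
feats = [descriptor(kp, pts) for kp in kps]
```

A real descriptor would also handle the background-information transform mentioned above; here the filtering and feature steps are kept deliberately minimal.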
Educational video classification by using a transcript to image transform and supervised learning
In this work, we present a method for automatic topic classification of educational videos using a speech transcript transform. Our method works as follows: First, speech recognition is used to generate video transcripts. Then, the transcripts are converted into images using a statistical co-occurrence transformation that we designed. Finally, a classifier is used to produce video category labels for a transcript image input. For our classifiers, we report results using a convolutional neural network (CNN) and a principal component analysis (PCA) model.
To evaluate our method, we used the Khan Academy on a Stick dataset, which contains 2,545 videos, each labeled with one or two of 13 categories. Experiments show that our method is effective and strongly competitive against other supervised learning-based methods.
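The transcript-to-image idea can be sketched as below. The word-level bigram co-occurrence matrix, normalised to [0, 1], is only an assumed stand-in for the statistical co-occurrence transformation described above; the vocabulary and counting scheme are illustrative.

```python
import numpy as np

def transcript_to_image(transcript, vocab):
    """Map a transcript to a |V| x |V| co-occurrence 'image': cell (i, j)
    counts how often vocab word i is immediately followed by vocab word j,
    then the whole matrix is scaled to [0, 1]."""
    idx = {w: i for i, w in enumerate(vocab)}
    img = np.zeros((len(vocab), len(vocab)))
    words = [w for w in transcript.lower().split() if w in idx]
    for a, b in zip(words, words[1:]):
        img[idx[a], idx[b]] += 1
    m = img.max()
    return img / m if m else img
```

The resulting matrix can then be fed to an image classifier (e.g. a CNN) exactly as a grayscale image would be.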
Sketch-Based Image Retrieval By Size-Adaptive and Noise-Robust Feature Description
We review available methods for Sketch-Based Image Retrieval (SBIR) and discuss their limitations. We then present two SBIR algorithms: the first extracts shape features using support regions calculated for each sketch point, and the second adapts the Shape Context descriptor [1] to make it scale invariant and to enhance its performance in the presence of noise. Both algorithms share the property of calculating the feature extraction window according to the sketch size. Experiments and comparative evaluation with state-of-the-art methods show that the proposed algorithms are competitive in distinctiveness capability and robust against noise.
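The size-adaptive idea behind the second algorithm, normalising the Shape Context's radial bins by the sketch's own scale, can be sketched like this. The bin counts and log-radius range are assumed values for illustration, not those of the paper.

```python
import numpy as np

def shape_context(points, r_bins=5, t_bins=12):
    """Scale-invariant shape context: log-radial bin edges are applied to
    pairwise distances normalised by their mean, so the effective feature
    window adapts to the sketch size."""
    pts = np.asarray(points, float)
    n = len(pts)
    d = np.hypot(pts[:, None, 0] - pts[None, :, 0],
                 pts[:, None, 1] - pts[None, :, 1])
    dn = d / d[d > 0].mean()              # size normalisation
    ang = np.arctan2(pts[:, None, 1] - pts[None, :, 1],
                     pts[:, None, 0] - pts[None, :, 0])
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), r_bins + 1)
    descs = []
    for i in range(n):
        h = np.zeros((r_bins, t_bins))
        for j in range(n):
            if i == j:
                continue
            r = np.searchsorted(r_edges, dn[i, j]) - 1
            if 0 <= r < r_bins:
                t = int((ang[i, j] + np.pi) / (2 * np.pi) * t_bins) % t_bins
                h[r, t] += 1
        descs.append(h.ravel() / max(h.sum(), 1))
    return np.array(descs)
```

Because both the distances and the bin edges are expressed in units of the mean pairwise distance, uniformly rescaling the sketch leaves the descriptors unchanged.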
A modular approach for query spotting in document images and its optimization using genetic algorithms
Query spotting in document images is a subclass of Content-Based Image Retrieval (CBIR) algorithms concerned with detecting occurrences of a query in a document image. Due to the noise and complexity of document images, spotting can be a challenging task, easily prone to false positives and partially incorrect matches, thereby reducing the overall precision of the algorithm. A robust and accurate spotting algorithm is essential to our current research on sketch-based retrieval of digitized lecture materials. We have recently proposed a modular spotting algorithm in [1]. Compared to existing methods, our algorithm is both application-independent and segmentation-free. However, it faces the same challenges of noise and image complexity. In this paper, inspired by our earlier research on optimizing parameter settings for CBIR using an evolutionary algorithm [2][3], we introduce a Genetic Algorithm-based optimization step in our spotting algorithm to improve each spotting result. Experiments using an image dataset of journal pages reveal promising performance: precision is significantly improved without compromising the recall of the overall spotting result.
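A Genetic Algorithm-based optimization step of this kind can be illustrated with a generic real-coded GA. The toy fitness below merely stands in for a spotting-precision measure, and the operators (elitist truncation selection, midpoint crossover, Gaussian mutation) are generic textbook choices, not the operators used in the paper.

```python
import random

def genetic_optimize(fitness, bounds, pop_size=20, generations=30, seed=0):
    """Minimal real-coded GA: keep the best half each generation, breed
    children by averaging two elite parents and adding Gaussian noise,
    and clamp every gene to its bounds. `fitness` is maximised."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [(x + y) / 2 + rng.gauss(0, 0.1 * (hi - lo))
                     for x, y, (lo, hi) in zip(a, b, bounds)]
            children.append([min(max(v, lo), hi)
                             for v, (lo, hi) in zip(child, bounds)])
        pop = elite + children
    return max(pop, key=fitness)

# toy stand-in for a spotting-precision fitness, peaking at (0.5, 2.0)
best = genetic_optimize(lambda p: -((p[0] - 0.5) ** 2 + (p[1] - 2.0) ** 2),
                        bounds=[(0.0, 1.0), (0.0, 4.0)])
```

In the actual application, the genome would encode the spotting algorithm's parameter settings and the fitness would score each spotting result.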
An Application-Independent and Segmentation-Free Approach for Spotting Queries in Document Images
We report our ongoing research on an application-independent and segmentation-free approach for spotting queries in document images. Built on our earlier work reported in [1][2], this paper introduces an image processing approach that finds occurrences of a query, which is a multi-part object, in a document image through five steps: (1) Preprocessing for image normalization and connected components extraction. (2) Feature Extraction from connected components. (3) Matching of the query and document image connected components' feature vectors. (4) Voting for determining candidate occurrences in the document image that are similar to the query. (5) Candidate Filtering for detecting relevant occurrences and filtering out irrelevant patterns. Compared to existing methods, our contributions are twofold: First, our approach is designed to deal with any type of query, without restriction to a particular class such as words or mathematical expressions. Second, it does not apply domain-specific segmentation to extract regions of interest from the document image, such as text paragraphs or mathematical calculations; instead, it considers all the image information. Experimental evaluation using scanned journal images shows promising performance and the possibility of further improvement.
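Steps (3)-(5) can be sketched as follows. The cosine similarity, additive vote accumulator, and 0.4 filtering factor are illustrative assumptions, not the matching, voting, and filtering rules actually used in the approach.

```python
import numpy as np

def spot(query_feats, page_components, sim_threshold=0.9):
    """Steps (3)-(5) of the pipeline: match query component features
    against page components, vote for candidate positions, and filter
    weakly supported candidates. `page_components` is a list of
    (position, feature_vector) pairs."""
    votes = {}
    for qf in query_feats:
        for pos, pf in page_components:
            sim = np.dot(qf, pf) / (np.linalg.norm(qf) * np.linalg.norm(pf))
            if sim >= sim_threshold:                    # (3) matching
                votes[pos] = votes.get(pos, 0) + sim    # (4) voting
    # (5) candidate filtering: keep positions with enough accumulated votes
    return sorted((p for p, v in votes.items()
                   if v >= len(query_feats) * 0.4),
                  key=lambda p: -votes[p])
```

A full implementation would precede this with the preprocessing and feature extraction steps (1)-(2) and group votes into spatial regions rather than single positions.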
A comparative study using contours and skeletons as shape representations for binary image matching
Contours and skeletons are well-known shape representations that embody visual information using a limited set of object points. Both representations have been applied in various pattern recognition applications, while studies in cognitive science have investigated their roles in human perception. Although their importance has been demonstrated in these fields, to our knowledge no existing study has compared their performances. Filling this gap, this paper presents an empirical study of the two shape representations by comparing their performances over different binary image categories and variations. The image categories include thick, elongated, and nearly thin images; the variations include addition of noise to the contours, blurring, and size reduction. The comparative evaluation relies on object classification (OC) and content-based image retrieval (CBIR) algorithms and evaluation metrics. The main findings highlight the superiority of contours, but also the improvements observed when skeletons are used for images with noisy contours.
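The two representations can be contrasted on a toy binary image. The boundary test and the distance-transform local-maxima skeleton below are naive approximations chosen for brevity, not the extraction methods used in the study.

```python
import numpy as np

def contour(img):
    """Boundary pixels: foreground with at least one background 4-neighbour."""
    pad = np.pad(img, 1)
    interior = (pad[:-2, 1:-1] & pad[2:, 1:-1] &
                pad[1:-1, :-2] & pad[1:-1, 2:])
    return img & ~interior

def skeleton(img):
    """Naive medial-axis estimate: foreground pixels that are 4-neighbour
    local maxima of the brute-force distance to the nearest background."""
    h, w = img.shape
    ys, xs = np.nonzero(~img)
    dist = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            if img[y, x]:
                dist[y, x] = np.hypot(ys - y, xs - x).min()
    pad = np.pad(dist, 1)
    keep = img.copy()
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        keep &= dist >= pad[1 + dy:h + 1 + dy, 1 + dx:w + 1 + dx]
    return keep

# a "thick" image in the study's terminology: a filled 8x8 square
img = np.zeros((16, 16), dtype=bool)
img[4:12, 4:12] = True
c, s = contour(img), skeleton(img)
```

For a thick shape like this, the skeleton reduces to the medial ridge and uses far fewer points than the contour, which is the kind of trade-off the study quantifies across image categories.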
Towards a segmentation and recognition-free approach for content-based document image retrieval of handwritten queries
We introduce a method for content-based document image retrieval (CBDIR) of handwritten queries that is both segmentation- and recognition-free. We first show that our method is underpinned by a theoretical model built on Bayes' rule. Next, we present an algorithmic implementation that takes into account real-world retrieval challenges caused by handwriting fluctuations and style variations. Our algorithm operates as follows: First, a number of connected components of the query are matched against the connected components of the document image using shape features. A similarity threshold is used to select the document image components that are most similar to the query components. Then, the selected components are used to detect candidate occurrences of the query in the document image using size-adaptive bounding boxes. Finally, a score is calculated for each candidate occurrence and used for ranking. We conduct a comparative evaluation of our method on a dataset of 200 printed document images, executing 40 printed and 200 handwritten queries of mathematical expressions. Experimental results demonstrate competitive performance: P-Recall = 100% and A-Recall = 99.95% for printed queries, and P-Recall = 73.5% and A-Recall = 57.92% for handwritten queries, outperforming a state-of-the-art CBDIR algorithm.
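The size-adaptive bounding-box step can be illustrated as below. The box padding, the seeding rule (one box per matched component), and the additive score are assumptions made for this sketch, not the candidate detection and scoring actually used in the method.

```python
def candidate_boxes(matches, qw, qh, pad=0.25):
    """Group matched component positions into size-adaptive candidate
    bounding boxes: each matched component seeds a box sized from the
    query dimensions (plus padding), and every match falling inside a
    box contributes its similarity to that box's score.
    `matches` is a list of ((x, y), similarity) pairs."""
    w, h = qw * (1 + pad), qh * (1 + pad)
    candidates = []
    for (x, y), _ in matches:
        box = (x, y, x + w, y + h)
        score = sum(s for (mx, my), s in matches
                    if x <= mx <= x + w and y <= my <= y + h)
        candidates.append((box, score))
    return sorted(candidates, key=lambda c: -c[1])
```

Scaling the box to the query size is what makes the candidate regions adapt to handwriting of different sizes; the scores would then feed the final ranking step.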